Flexibly Mining Better Subgroups
Authors
Abstract
In subgroup discovery, also known as supervised pattern mining, discovering high-quality one-dimensional subgroups and their refinements is a crucial task. For nominal attributes this is relatively straightforward, as we can treat individual attribute values as binary features. For numeric attributes the task is more challenging, because a single numeric value usually covers too few records to yield reliable statistics. Instead, we can consider combinations of adjacent values, i.e., bins. Existing binning strategies, however, are not tailored to subgroup discovery: they do not directly optimize the quality of the resulting subgroups and can therefore degrade the mining result. To address this issue, we propose FLEXI. In short, FLEXI uses optimal binning to find high-quality binary features for both numeric and ordinal attributes. We instantiate FLEXI with various quality measures and show how to achieve efficiency for each of them. Experiments on both synthetic and real-world data sets show that FLEXI outperforms the state of the art, with up to a 25-fold improvement in subgroup quality.
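To make the idea of "binning that directly optimizes subgroup quality" concrete, here is a minimal sketch of the simplest special case: finding the single contiguous interval of one numeric attribute that maximizes Weighted Relative Accuracy (WRAcc) for a binary target. This is not FLEXI, which searches for optimal multi-bin partitions under several quality measures; the choice of WRAcc and the function name `best_wracc_interval` are illustrative assumptions. Because WRAcc(S) = (pos_S - p * |S|) / N is additive over the distinct attribute values, the best interval reduces to a maximum-sum subarray, solvable in one linear pass (Kadane's algorithm) after sorting the distinct values.

```python
from collections import Counter


def best_wracc_interval(values, labels):
    """Hypothetical helper: return (lo, hi, wracc) for the interval lo <= x <= hi
    of a numeric attribute that maximizes WRAcc w.r.t. a 0/1 target.

    Assumes WRAcc as the quality measure; this is an illustration of
    quality-driven binning, not the FLEXI algorithm.
    """
    n = len(values)
    if n == 0 or n != len(labels):
        raise ValueError("values and labels must be non-empty and equally long")
    p = sum(labels) / n  # overall positive rate of the target

    # Aggregate counts per distinct value: only cut points between
    # distinct values can change which records an interval covers.
    total = Counter(values)
    positives = Counter(v for v, y in zip(values, labels) if y)
    distinct = sorted(total)

    # Each distinct value contributes w = pos_v - p * n_v to pos_S - p * |S|,
    # so the best contiguous interval is a maximum-sum subarray (Kadane).
    best = (distinct[0], distinct[0], float("-inf"))
    run_sum, run_start = 0.0, 0
    for i, v in enumerate(distinct):
        w = positives[v] - p * total[v]
        if run_sum <= 0:
            run_sum, run_start = w, i  # start a new interval at v
        else:
            run_sum += w  # extend the current interval to include v
        if run_sum > best[2]:
            best = (distinct[run_start], v, run_sum)

    lo, hi, score = best
    return lo, hi, score / n  # divide by N to obtain WRAcc
```

For example, with `values = [1, 2, 3, 4, 5, 6, 7, 8]` and `labels = [0, 0, 1, 1, 1, 0, 0, 0]`, the call returns the interval from 3 to 5 with WRAcc 3/8 * (1 - 3/8) = 0.234375. Other quality measures (e.g., ones that are not additive over records) do not reduce to a maximum-subarray problem, which is one reason an efficient instantiation has to be worked out per measure.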
Similar resources
A Simple Method for Heuristic Modeling of Expert Knowledge in Chronic Disease: Identification of Prognostic Subgroups in Rheumatology
Identification of prognostic subgroups is of key clinical interest at the early stages of chronic disease. The aim of this study is to examine whether representation of physicians' expert knowledge in a simple heuristic model can improve data mining methods in prognostic assessments of patients with rheumatoid arthritis (RA). Five rheumatology consultants' experiences of clinical data patterns ...
Novel Algorithm of Spatiotemporal Association Rules Mining Based on Event-coverage
In order to eliminate data redundancy in a spatiotemporal database, flexibly create spatiotemporal association patterns, and quickly discover spatiotemporal association rules, this paper first adopts event-coverage to create a spatiotemporal mining database; the method can divide the spatiotemporal domain into spatiotemporal transaction cells, where each cell is made of attribute values an...
Functional Brain Imaging with Multi-objective Multi-modal Evolutionary Optimization
Functional brain imaging is a source of spatio-temporal data mining problems. A new framework hybridizing multi-objective and multimodal optimization is proposed to formalize these data mining problems, and addressed through Evolutionary Computation (EC). The merits of EC for spatio-temporal data mining are demonstrated as the approach facilitates the modelling of the experts’ requirements, and...
Deviation analysis
The two general data analytic questions of subgroup mining (B2.2) deal with deviations and associations (C5.2.3, C5.2.4). A deviation pattern describes a deviating behavior (distribution) of a target variable in a subgroup. The target variable and behavior type are selected by the analyst for an individual mining task; the deviating subgroups are determined by the mining method. Deviation patterns ...
Idiopathic Constipation can be Subdivided in Clinical Subtypes: Data Mining by Cluster Analysis on a Population-Based Study
The prevalence of non-organic constipation differs from country to country, and the reliability of the estimated rates is uncertain. Moreover, the clinical relevance of subdividing the heterogeneous functional constipation disorders into predefined subgroups is largely unknown. Aim: to estimate the prevalence of constipation in a population-based sample and determine whether clinical subgroups ...